AITopics | guidance image

Collaborating Authors

guidance image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Guided Upsampling for Low-light Image Enhancement

Dcosta, Angela Vivian, Song, Chunbo, Radkowski, Rafael

arXiv.org Artificial IntelligenceNov-21-2025

We introduce Adaptive Guided Upsampling (AGU), an efficient method for upscaling low-light images capable of optimizing multiple image quality characteristics at the same time, such as reducing noise and increasing sharpness. It is based on a guided image method, which transfers image characteristics from a guidance image to the target image. Using state-of-the-art guided methods, low-light images lack sufficient characteristics for this purpose due to their high noise level and low brightness, rendering suboptimal/not significantly improved images in the process. We solve this problem with multi-parameter optimization, learning the association between multiple low-light and bright image characteristics. Our proposed machine learning method learns these characteristics from a few sample images-pairs. AGU can render high-quality images in real time using low-quality, low-resolution input; our experiments demonstrate that it is superior to state-of-the-art methods in the addressed low-light use case.

artificial intelligence, machine learning, sharpness, (18 more...)

arXiv.org Artificial Intelligence

2511.16623

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

Optimisation-Based Multi-Modal Semantic Image Editing

Li, Bowen, Yang, Yongxin, McDonagh, Steven, Zhang, Shifeng, Tudosiu, Petru-Daniel, Parisot, Sarah

arXiv.org Artificial IntelligenceNov-28-2023

Image editing affords increased control over the aesthetics and content of generated images. Pre-existing works focus predominantly on text-based instructions to achieve desired image modifications, which limit edit precision and accuracy. In this work, we propose an inference-time editing optimisation, designed to extend beyond textual edits to accommodate multiple editing instruction types (e.g. spatial layout-based; pose, scribbles, edge maps). We propose to disentangle the editing task into two competing subtasks: successful local image modifications and global content consistency preservation, where subtasks are guided through two dedicated loss functions. By allowing to adjust the influence of each loss function, we build a flexible editing solution that can be adjusted to user preferences. We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits, through both qualitative and quantitative experiments.

diffusion model, editing, modification, (14 more...)

arXiv.org Artificial Intelligence

2311.16882

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.46)

Industry: Media > Photography (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

NeuBTF: Neural fields for BTF encoding and transfer

Rodriguez-Pardo, Carlos, Kazatzis, Konstantinos, Lopez-Moreno, Jorge, Garces, Elena

arXiv.org Artificial IntelligenceJul-3-2023

Neural material representations are becoming a popular way to represent materials for rendering. They are more expressive than analytic models and occupy less memory than tabulated BTFs. However, existing neural materials are immutable, meaning that their output for a certain query of UVs, camera, and light vector is fixed once they are trained. While this is practical when there is no need to edit the material, it can become very limiting when the fragment of the material used for training is too small or not tileable, which frequently happens when the material has been captured with a gonioreflectometer. In this paper, we propose a novel neural material representation which jointly tackles the problems of BTF compression, tiling, and extrapolation. At test time, our method uses a guidance image as input to condition the neural BTF to the structural features of this input image. Then, the neural BTF can be queried as a regular BTF using UVs, camera, and light vectors. Every component in our framework is purposefully designed to maximize BTF encoding quality at minimal parameter count and computational complexity, achieving competitive compression rates compared with previous work. We demonstrate the results of our method on a variety of synthetic and captured materials, showing its generality and capacity to learn to represent many optical properties.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cag.2023.06.018

2307.01199

Country:

Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.88)

Add feedback

Guided Deep Generative Model-based Spatial Regularization for Multiband Imaging Inverse Problems

Zhao, Min, Dobigeon, Nicolas, Chen, Jie

arXiv.org Artificial IntelligenceJun-28-2023

When adopting a model-based formulation, solving inverse problems encountered in multiband imaging requires to define spatial and spectral regularizations. In most of the works of the literature, spectral information is extracted from the observations directly to derive data-driven spectral priors. Conversely, the choice of the spatial regularization often boils down to the use of conventional penalizations (e.g., total variation) promoting expected features of the reconstructed image (e.g., piecewise constant). In this work, we propose a generic framework able to capitalize on an auxiliary acquisition of high spatial resolution to derive tailored data-driven spatial regularizations. This approach leverages on the ability of deep learning to extract high level features. More precisely, the regularization is conceived as a deep generative network able to encode spatial semantic features contained in this auxiliary image of high spatial resolution. To illustrate the versatility of this approach, it is instantiated to conduct two particular tasks, namely multiband image fusion and multiband image inpainting. Experimental results obtained on these two tasks demonstrate the benefit of this class of informed regularizations when compared to more conventional ones.

artificial intelligence, machine learning, regularization, (19 more...)

arXiv.org Artificial Intelligence

2306.17197

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)

Add feedback

Simple and Effective Synthesis of Indoor 3D Scenes

Koh, Jing Yu, Agrawal, Harsh, Batra, Dhruv, Tucker, Richard, Waters, Austin, Lee, Honglak, Yang, Yinfei, Baldridge, Jason, Anderson, Peter

arXiv.org Artificial IntelligenceDec-1-2022

We study the problem of synthesizing immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints, including viewpoints that extrapolate far beyond the input images while maintaining 3D consistency. Existing approaches are highly complex, with many separately trained stages and components. We propose a simple alternative: an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images. On the Matterport3D and RealEstate10K datasets, our approach significantly outperforms prior work when evaluated by humans, as well as on FID scores. Further, we show that our model is useful for generative data augmentation. A vision-and-language navigation (VLN) agent trained with trajectories spatially-perturbed by our model improves success rate by up to 1.5% over a state of the art baseline on the R2R benchmark. Our code will be made available to facilitate generative data augmentation and applications to downstream robotics and embodied AI tasks.

artificial intelligence, machine learning, pathdreamer, (19 more...)

arXiv.org Artificial Intelligence

2204.0296

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Michigan (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Add feedback

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

Tumanyan, Narek, Geyer, Michal, Bagon, Shai, Dekel, Tali

arXiv.org Artificial IntelligenceNov-22-2022

Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts. However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content. In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation -- given a guidance image and a target text prompt, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text, while preserving the semantic layout of the source image. Specifically, we observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model. This results in a simple and effective approach, where features extracted from the guidance image are directly injected into the generation process of the target image, requiring no training or fine-tuning and applicable for both real or generated guidance images. We demonstrate high-quality results on versatile text-guided image translation tasks, including translating sketches, rough drawings and animations into realistic images, changing of the class and appearance of objects in a given image, and modifications of global qualities such as lighting and color.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.12572

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Gaze Estimation Approach Using Deep Differential Residual Network

Huang, Longzhao, Li, Yujie, Wang, Xu, Wang, Haoyu, Bouridane, Ahmed, Chaddad, Ahmad

arXiv.org Artificial IntelligenceAug-8-2022

Gaze estimation, which is a method to determine where a person is looking at given the person's full face, is a valuable clue for understanding human intention. Similarly to other domains of computer vision, deep learning (DL) methods have gained recognition in the gaze estimation domain. However, there are still gaze calibration problems in the gaze estimation domain, thus preventing existing methods from further improving the performances. An effective solution is to directly predict the difference information of two human eyes, such as the differential network (Diff-Nn). However, this solution results in a loss of accuracy when using only one inference image. We propose a differential residual model (DRNet) combined with a new loss function to make use of the difference information of two eye images. We treat the difference information as auxiliary information. We assess the proposed model (DRNet) mainly using two public datasets (1) MpiiGaze and (2) Eyediap. Considering only the eye features, DRNet outperforms the state-of-the-art gaze estimation methods with $angular-error$ of 4.57 and 6.14 using MpiiGaze and Eyediap datasets, respectively. Furthermore, the experimental results also demonstrate that DRNet is extremely robust to noise images.

angular error, estimation, information, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/s22145462

2208.04298

Country:

Asia > China (0.05)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > UAE > Sharjah Emirate > Sharjah (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Leveraging in-domain supervision for unsupervised image-to-image translation tasks via multi-stream generators

Yerushalmi, Dvir, Danon, Dov, Bermano, Amit H.

arXiv.org Artificial IntelligenceDec-30-2021

Supervision for image-to-image translation (I2I) tasks is hard to come by, but bears significant effect on the resulting quality. In this paper, we observe that for many Unsupervised I2I (UI2I) scenarios, one domain is more familiar than the other, and offers in-domain prior knowledge, such as semantic segmentation. We argue that for complex scenes, figuring out the semantic structure of the domain is hard, especially with no supervision, but is an important part of a successful I2I operation. We hence introduce two techniques to incorporate this invaluable in-domain prior knowledge for the benefit of translation quality: through a novel Multi-Stream generator architecture, and through a semantic segmentation-based regularization loss term. In essence, we propose splitting the input data according to semantic masks, explicitly guiding the network to different behavior for the different regions of the image. In addition, we propose training a semantic segmentation network along with the translation task, and to leverage this output as a loss term that improves robustness. We validate our approach on urban data, demonstrating superior quality in the challenging UI2I tasks of converting day images to night ones. In addition, we also demonstrate how reinforcing the target dataset with our augmented images improves the training of downstream tasks such as the classical detection one.

generator, proceedings, translation, (13 more...)

arXiv.org Artificial Intelligence

2112.15091

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Neural Photometry-guided Visual Attribute Transfer

Rodriguez-Pardo, Carlos, Garces, Elena

arXiv.org Artificial IntelligenceDec-5-2021

We present a deep learning-based method for propagating spatially-varying visual material attributes (e.g. texture maps or image stylizations) to larger samples of the same or similar materials. For training, we leverage images of the material taken under multiple illuminations and a dedicated data augmentation policy, making the transfer robust to novel illumination conditions and affine deformations. Our model relies on a supervised image-to-image translation framework and is agnostic to the transferred domain; we showcase a semantic segmentation, a normal map, and a stylization. Following an image analogies approach, the method only requires the training data to contain the same visual structures as the input guidance. Our approach works at interactive rates, making it suitable for material edit applications. We thoroughly evaluate our learning methodology in a controlled setup providing quantitative measures of performance. Last, we demonstrate that training the model on a single material is enough to generalize to materials of the same type without the need for massive datasets.

dataset, guidance image, transaction, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TVCG.2021.3133081

2112.0252

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Galicia > Madrid (0.05)
North America > United States (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry: Media (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NVIDIA Novel vid2vid Framework Synthesizes Realistic World-Consistent Videos

#artificialintelligenceJul-20-2020, 11:25:08 GMT

A problem with video to video synthesis (vid2vid) is the technique's forgetfulness. Take the 2016 "Mannequin Challenge" viral video trend as an example: people remain frozen while a camera passes through them to capture the scene. Viewers would naturally be confused if the camera view returned to a previously captured person but their face appeared totally different -- as if they were wearing a magically transformative mask. This phenomenon has plagued vid2vid methods that generate video frames based only the information available in the immediately preceding frame(s). For AI researchers, achieving vid2vid temporal consistency over the entire rendered 3D world is a challenge.

artificial intelligence, framework synthesize realistic world-consistent video, nvidia novel vid2vid, (5 more...)

#artificialintelligence

Industry: Information Technology > Hardware (0.43)

Technology: Information Technology > Artificial Intelligence > Vision (0.41)

Add feedback